This tutorial is based on material by Dr. med. Ramon Saccilotto, Department of Clinical Research, University Hospital Basel, Switzerland.
In essence, ggplot2 is a collection of functions that helps you create maintainable, publication-ready plots efficiently.
The package was developed by Hadley Wickham, a New Zealand-born statistician who was an assistant professor of statistics at Rice University (Houston, Texas). It is maintained by the author and a number of volunteers on GitHub.
Descriptions and examples of almost all of the package's functions can be found in the fantastic online documentation.
If you use ggplot2 on a regular basis, I highly recommend reading at least one of the following books for further information:
«ggplot2 is an R package for producing statistical, or data, graphics, but it is unlike most other graphics packages because it has a deep underlying grammar.»
«This grammar, based on the Grammar of Graphics (Wilkinson, 2005), is composed of a set of independent components that can be composed in many different ways. [..]»
«Plots can be built up iteratively and edited later.»
«A carefully chosen set of defaults means that most of the time you can produce a publication-quality graphic in seconds, but if you do have special formatting requirements, a comprehensive theming system makes it easy to do what you want. [..]»
«ggplot2 is designed to work in a layered fashion, starting with a layer showing the raw data then adding layers of annotation and statistical summaries. [..]»
«ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts.»
«It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.»
This tutorial is intended to give you a brief overview of the amazing ggplot2 package. It was originally written for version 1.0.1, and the printed output comes from a newer release.
After completion you should be able to quickly create a variety of different plots and have the necessary understanding to adapt the plots to your specific needs.
Please note, however, that we will not cover all of the functionality of the ggplot2 package.
More information and examples of almost all functions and plot types can be found in the online documentation.
Your first step should be to install and load the ggplot2 package.
# first we need to install ggplot2 and the RColorBrewer palettes used below
# (skip this step if the packages are already installed)
install.packages(c("ggplot2", "RColorBrewer"))
# we will require the ggplot2 package for our graphics
# note: there are some additional packages such as plyr,
# reshape2 and scales which you may find useful
library(ggplot2)
library(RColorBrewer)
# prices of 53,940 sparkly round cut diamonds
head(diamonds)
## # A tibble: 6 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
?diamonds
# motor trend car road tests
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# vapor pressure of mercury at certain temperatures
head(pressure)
## temperature pressure
## 1 0 0.0002
## 2 20 0.0012
## 3 40 0.0060
## 4 60 0.0300
## 5 80 0.0900
## 6 100 0.2700
ggplot2 is built on the grid graphics system rather than on base graphics. Therefore, ggplot is not «really» compatible with the default plot functions.
Plots are drawn in layers that are stacked on top of each other.
To create a plot, we use either the qplot() or the ggplot() function (note: qplot() is deprecated as of ggplot2 3.4.0).
The ggplot() syntax takes some getting used to. However, once you understand the underlying principle, it is easy to comprehend and very well suited for more complex plots.
For a nice read on this, see http://byrneslab.net/classes/biol607/readings/wickham_layered-grammar.pdf
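Because ggplot2 draws with grid, base functions such as abline() or lines() do not add anything to a ggplot. The ggplot2 way is to add another layer instead. The following is a minimal sketch. It assumes you want to overlay a simple lm() fit on the mtcars data, and the variable names are only for illustration:

```r
library(ggplot2)

# fit a simple linear model with base R
fit <- lm(mpg ~ wt, data = mtcars)

# base graphics would use plot(mtcars$wt, mtcars$mpg); abline(fit)
# with ggplot2 the reference line is added as another layer instead
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_abline(intercept = coef(fit)[["(Intercept)"]],
              slope = coef(fit)[["wt"]], color = "red")
print(p)
```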
str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
# a basic x-y plot
ggplot(data=diamonds,aes(x=depth,y=table))+geom_point()
# now with an additional aesthetic: we map the variable cut to color
ggplot(data=diamonds, aes(x=depth, y=table, color=cut)) + geom_point()
# note: not every geom works with every mapping, e.g. geom_bar() counts
# cases and fails if a y variable is mapped as well
# bar chart (counts per category) with ggplot
ggplot(diamonds, aes(x=clarity, fill=cut)) + geom_bar()
Here, the data is ‘diamonds’. The aesthetic choices are: clarity on the x axis, the count on the y axis (so we did not specify a y variable), and the ‘cut’ subgroups shown as fill colors.
# the same bar chart with qplot(x, y, data, ...), which guesses some of the settings you need
qplot(clarity, data=diamonds, fill=cut, geom="bar")
Important note: ggplot (and qplot, for that matter) always expects the data to be in a data.frame.
The syntax for qplot is qplot(«x-axis», «y-axis», data=«data.frame», ..).
The ggplot2 package has an extensible fortify() method that can be used to convert R objects to data frames for plotting.
Some useful fortify methods are already available in the package. More can be found in the ggfortify package on GitHub.
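When no suitable fortify method exists (the available methods differ between package versions), you can build the data.frame yourself, which works in any version. The sketch below uses the Nile time series from base R's datasets package. It is chosen here only as an example of an object that is not a data.frame:

```r
library(ggplot2)

# Nile is a time series (class "ts"), not a data.frame,
# so we convert it to a data.frame with one column per variable
nile_df <- data.frame(
  year = as.numeric(time(Nile)),
  flow = as.numeric(Nile)
)

p <- ggplot(nile_df, aes(x = year, y = flow)) + geom_line()
print(p)
```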
# quickly create a scatterplot of our data using qplot(x, y, data)
qplot(wt, mpg, data=mtcars) # wt is going to be on the x axis, mpg on the y axis.
# data can be transformed with functions
qplot(log(wt), mpg-10, data=mtcars)
# plots can be further refined by using additional parameters
# note: we are mapping the variable «qsec» to a color
qplot(wt, mpg, data=mtcars, color=qsec)
Plot attributes such as color, point shape, size, etc. are called aesthetics.
With qplot(), assigning a variable to an aesthetic will map the values of the variable into the value-space of the aesthetic.
Note: Both the American «color» and the British «colour» spellings are supported in most cases.
# color and colour will work for most cases
qplot(wt, mpg, data=mtcars, color=qsec)
# note: colour instead of color
qplot(wt, mpg, data=mtcars, colour=qsec)
# note: in this example ggplot treats 10 as data and maps it onto the size
# scale (and draws a legend for it), which is probably not what was intended
qplot(wt, mpg, data=mtcars, color=qsec, size=10)
# use the I() function «as is» to set aesthetics instead of mapping
qplot(wt, mpg, data=mtcars, color=qsec, size=I(10))
# side note: it is possible to use alpha-blending for overlapping elements
qplot(wt, mpg, data=mtcars, color=qsec, size=I(10), alpha=qsec)
# note: alpha-opacity is set between 0 (transparent) and 1 (opaque)
qplot(wt, mpg, data=mtcars, color=qsec, size=I(10), alpha=I(0.5))
# we take a closer look at the variable cyl from the dataset mtcars
# note: the variable is stored as a continuous number not as a factor
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
summary(mtcars$cyl)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.000 4.000 6.000 6.188 8.000 8.000
table(mtcars$cyl)
##
## 4 6 8
## 11 7 14
# regular numeric variables will be mapped to a continuous scale
qplot(wt, mpg, data=mtcars, color=cyl)
# factored variables will be displayed with a discrete scale
qplot(wt, mpg, data=mtcars, color=factor(cyl))
# ggplot will try to guess the «correct» plot for your data
qplot(wt, mpg, data=mtcars)
qplot(factor(cyl), data=mtcars)
# a specific type of plot can be set with the attribute geom=«type»
qplot(wt, mpg, data=mtcars, geom="point")
qplot(wt, mpg, data=mtcars, geom="line")
# plot-types can be combined
qplot(wt, mpg, data=mtcars, geom=c("line", "point"))
# note: with qplot, size=I(2) is applied to both the lines and the points
qplot(wt, mpg, data=mtcars, geom=c("line", "point"), size=I(2))
# pro tip: resort to ggplot syntax (more on that later)
qplot(wt, mpg, data=mtcars) + geom_line() + geom_point(size=4)
# a plot can be rotated by 90°
# note: coord_flip() will rotate the plot after the calculation of
# any summary statistics (e.g. smoothers)
qplot(factor(cyl), data=mtcars)
qplot(factor(cyl), data=mtcars) + coord_flip()
# difference between fill/color bars
qplot(factor(cyl), data=mtcars, fill=factor(cyl))
qplot(factor(cyl), data=mtcars, color=factor(cyl))
# use different position properties for bars (stacked, dodged, fill, identity)
head(diamonds)
## # A tibble: 6 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
# note: the position argument of qplot() is deprecated,
# so we use the ggplot() syntax for this example
ggplot(diamonds, aes(x=clarity, fill=cut)) + geom_bar(position="stack")
ggplot(diamonds, aes(x=clarity, fill=cut)) + geom_bar(position="dodge")
ggplot(diamonds, aes(x=clarity, fill=cut)) + geom_bar(position="fill")
ggplot(diamonds, aes(x=clarity, fill=cut)) + geom_bar(position="identity")
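The ggplot syntax is used to build a plot layer by layer. Usually, the following steps are involved: define the data with ggplot(«data.frame»), add a graphical representation (a geom) such as geom_point() or geom_line(), and map variables to aesthetics with aes().
Note: By default all layers use the same underlying data. A layer can be given its own data through its data argument, but you should only rarely need this workaround.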
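The workaround mentioned above is the data argument that every geom accepts. The following is a minimal sketch. It assumes we want to highlight the highest reading of the pressure data with a second layer; peak is a hypothetical helper object:

```r
library(ggplot2)

# a second data set for one layer only: the row with the highest pressure
peak <- pressure[which.max(pressure$pressure), ]

p <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line() +                                     # uses the plot-wide data
  geom_point(data = peak, color = "red", size = 4)  # uses its own data
print(p)
```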
# we are going to use the pressure data
# head(pressure)
# nothing happens if we only define our data
ggplot(pressure)
# but we can quickly add a representation
# note: the aes() function is used for variable mapping
ggplot(pressure) + geom_point(aes(x=temperature, y=pressure))
# as x and y are used so often, we can leave out the argument names
# note: for later maintenance it is usually better to name them
ggplot(pressure) + geom_point(aes(temperature, pressure))
# note: you can access the previously created plot with «last_plot()»
last_plot()
# to set an aesthetic to a fixed value (instead of mapping a variable),
# specify it outside of the aes() function
ggplot(pressure) + geom_point(aes(temperature, pressure), size=4)
# aesthetics can also be defined separately
ggplot(pressure) + aes(temperature, pressure) + geom_point(size=4)
# create some normally distributed test data
# (set.seed makes the random numbers reproducible)
set.seed(1)
tmp <- data.frame(x=rnorm(4000), y=rnorm(4000))
p.myplot <- ggplot(tmp, aes(x,y))
# default plotting
p.myplot + geom_point(color="red")
# plotting using hollow circles
p.myplot + geom_point(shape=1, color="red")
# plotting using pixels
p.myplot + geom_point(shape=".", color="red")
# plotting using alpha transparency
# note: requires the scales package (installed together with ggplot2)
p.myplot + geom_point(color=scales::alpha("red", 1/2))
p.myplot + geom_point(color=scales::alpha("red", 1/6))
# ggplot will actually return an object that can be modified
# note: the object can also be saved for later use with save()
# saving a plot or layer definitions will also include the plot data
p.myplot <- ggplot(pressure)
# summary information about the plot
summary(p.myplot)
## data: temperature, pressure [19x2]
## faceting: <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
# adding some additional layers
p.myplot <- p.myplot + aes(temperature, pressure) + geom_point(size=4)
summary(p.myplot)
## data: temperature, pressure [19x2]
## mapping: x = ~temperature, y = ~pressure
## faceting: <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## -----------------------------------
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
# the plot can be printed by just calling the object or using print()
p.myplot
print(p.myplot)
# the underlying data is saved within the ggplot-object. modifications of
# the data will not alter the plot if the plot-code is not rerun.
# there is however a special syntax to run the plot with updated data
pressure2 <- data.frame(
"temperature"=pressure$temperature, "pressure"=log(pressure$pressure))
# print the plot with updated data
p.myplot %+% pressure2
# a plot can be exported using ggsave (width and height are in inches by default)
# note: some formats need an additional package, e.g. svg output requires svglite
ggsave(filename="testplot.pdf", plot=p.myplot, width=10, height=5)
# ggsave(filename="testplot.svg", plot=p.myplot, width=10, height=5)
ggsave(filename="testplot.png", plot=p.myplot, dpi=72, width=10, height=5)
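Instead of save(), you can store a single plot object with base R's saveRDS() and load it again with readRDS(). This sketch assumes a temporary file is an acceptable location:

```r
library(ggplot2)

p <- ggplot(pressure, aes(temperature, pressure)) + geom_point()

# store the complete plot object (including its data) in a temporary file
plot_file <- tempfile(fileext = ".rds")
saveRDS(p, plot_file)

# read it back later and print or modify it as usual
p_restored <- readRDS(plot_file)
print(p_restored + theme_bw())
```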
# let's define a base plot and aesthetic-mapping
p.myplot <- ggplot(pressure) + aes(x=temperature, y=pressure)
# using multiple layers
p.myplot +
geom_point(color="purple3", size=6) +
geom_line(color="steelblue2", size=2)
# the order of the layers does matter
# (each new layer is drawn on top of the previous)
p.myplot +
geom_line(color="steelblue", size=2) +
geom_point(color="purple3", size=6)
# aesthetics defined in the base layer will be used for all layers
# note: a fixed value such as color="red" passed to ggplot() is not
# applied to the layers; set it in the individual geoms instead
ggplot(pressure, aes(x=temperature, y=pressure), color="red") +
geom_line(size=4, alpha=0.3) +
geom_point(size=4)
# the full argument name for mapping variables is mapping=«aes()»;
# fixed values are passed as additional arguments
# note: the old geom_params=«list()» argument is no longer supported
ggplot(pressure) +
geom_point(
mapping=aes(x=temperature, y=pressure, color=factor(temperature)),
size=4, shape=18
)
The plotting functions range from quick shortcuts to fully explicit definitions: qplot(), ggplot(), geom_«type»() and layer().
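Each geom_«type»() function is a shortcut that creates a layer() with a geom, a stat and a position. The sketch below shows the explicit form. It assumes ggplot2 2.0 or newer, where fixed values such as size are passed via params:

```r
library(ggplot2)

p0 <- ggplot(pressure, aes(x = temperature, y = pressure))

# the convenient shortcut
p1 <- p0 + geom_point(size = 4)

# the same layer spelled out with layer()
p2 <- p0 + layer(
  geom = "point", stat = "identity", position = "identity",
  params = list(size = 4)
)
print(p2)
```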
# it is possible to mix qplot and ggplot
qplot(temperature, pressure, data=pressure, geom="line", lty=I("dashed")) +
geom_point(size=4)
Scales are required to give the plot reader a sense of reference and thus encompass the ideas of both axes and legends on plots.
ggplot2 will usually choose appropriate scales automatically and display legends if necessary.
It is, however, very easy to override or modify the default values.
The scale syntax is scale_«attribute»_«optional subspecification»(), e.g. scale_x_continuous() or scale_x_discrete().
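Here is a short sketch of the naming pattern. It assumes we want a renamed x axis with custom breaks and a log10 y axis for the pressure data. scale_y_log10() is a shortcut for a continuous scale with a log10 transformation:

```r
library(ggplot2)

p <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_point() +
  # continuous x axis with a custom name and custom breaks
  scale_x_continuous(name = "Temperature (degrees Celsius)",
                     breaks = seq(0, 360, by = 60)) +
  # continuous y axis on a log10 scale
  scale_y_log10(name = "Vapor pressure of mercury (mm Hg)")
print(p)
```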
# setup our plot with default scales
p.myplot <- ggplot(pressure) +
aes(temperature, pressure, color=factor(temperature)) +
geom_point(size=4)
# scales can be limited to a certain range
p.myplot
p.myplot + scale_x_continuous("Temperature", limits=c(200, 400))
## Warning: Removed 10 rows containing missing values (geom_point).
# scales that are used as axes will take the name as axis label
p.myplot +
scale_color_discrete(name="Temperature \nin °C") +
scale_y_continuous(name="Vapor pressure of mercury (mm Hg)")
# legends can also be removed (if not important to understand the plot)
p.myplot + scale_color_discrete(guide="none")
# setup a different plot
p.myplot <- ggplot(diamonds, aes(cut, fill=color)) + geom_bar()
p.myplot
# the axis can be renamed using two different methods
p.myplot + xlab("Diamond Cut")
p.myplot + scale_x_discrete(name="Diamond Cut Description")
p.myplot + scale_y_continuous(name="Number of Diamonds")
# names of legends can also be set
p.myplot + scale_fill_discrete(name="Diamond Color")
# using some custom colors
# note: ColorBrewer palettes were designed for readable maps and often provide
# a good alternative to the standard colors. To see all available brewer
# palettes use «RColorBrewer::display.brewer.all()»
p.myplot + scale_fill_grey()
p.myplot + scale_fill_hue()
p.myplot + scale_fill_brewer()
# p.myplot + scale_fill_brewer(type="seq", palette=3)
# p.myplot + scale_fill_brewer(palette="Paired")
# using a custom color palette with specified order
# note: color values should be specified as hex or color names
p.myplot + aes(fill=cut) + scale_fill_manual(
values = c("#7fc6bc","#083642","#b1df01",
"#cdef9c","#466b5d", "#744db5", "#ccb2e8"))
# using predefined colors for specific values
# note: values that are not present in the data will not be shown
p.myplot + aes(fill=cut) + scale_fill_manual(
values = c("Fair"="#083642", "Good"="#466b5d",
"Very Good"="#7fc6bc","Premium"="#cdef9c",
"Ideal"="#b1df01", "Not specified"="#ffffff"))
# removing values from the legend and custom labelling of values
# note: you must specify colors for all existing values
p.myplot + scale_fill_manual(
name="Colors",
values = c("D"="#083642", "E"="#466b5d", "F"="#7fc6bc","G"="#cdef9c",
"H"="#b1df01", "I"="#ababab", "J"="#ececec"),
breaks = c("D", "E", "F"),
labels = c("E"="Dark Green", "D"="Emerald", "F"="Wood"))
# legends can also be styled using guides
# note: guides can be defined once and be easily applied to multiple plots
p.mylegend <- guide_legend(
title="Color of the \nDiamond",
title.position="top",
direction="horizontal",
label.position="top",
label.hjust=0.5,
label.vjust=0.5,
ncol=2,
byrow=TRUE
)
# apply some styling to the legend
p.myplot + guides(fill = p.mylegend)
p.myplot + scale_fill_discrete(guide=p.mylegend)
# handling problems with alpha transparency
p.myplot + aes(alpha=color)
# remove the alpha transparency for the legend
p.myplot + aes(alpha=color) +
guides(fill = guide_legend( override.aes=list(alpha=1) ))
# limiting scales will remove all points that are outside of the scale
# note: be careful, this is not the same as just focusing on a graph region
p.myplot + scale_y_continuous(limits=c(0,15000))
## Warning: Removed 2 rows containing missing values (geom_bar).
# to focus on a specific region, the coord_cartesian() function
# should be used with the specified limits
p.myplot + coord_cartesian(ylim=c(0,15000))
Some graphical representations do not use the raw data directly, but perform a statistical transformation first, e.g. binning.
Several transformations are included in the ggplot2 package and can be called with the stat_«transformation»() functions.
Examples are stat_bin(), stat_boxplot(), stat_qq(), stat_unique(), stat_smooth(), stat_summary() and more.
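To see what a stat computed, layer_data() returns the data of a layer after all stats have been applied. The following minimal sketch uses the bar chart of cyl from above, where geom_bar() relies on stat_count():

```r
library(ggplot2)

# geom_bar() uses stat_count() to count the rows per value of x
p <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# layer_data() returns the layer's data after the stat was applied
counts <- layer_data(p)
counts[, c("x", "count")]
```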
# histograms will use stat_bin to calculate number of items per bin
ggplot(mtcars) + aes(qsec) + geom_histogram(binwidth=0.5)
ggplot(mtcars) + aes(qsec) + geom_histogram(binwidth=1)
# define a base plot to illustrate smoothed lines
p.myplot <- ggplot(mtcars) + aes(x=disp, y=mpg) + geom_point(size=4)
p.myplot
# draw a smooth line (local regression function) through the points
# note: with fewer than 1000 observations the default smoother is loess;
# method and formula are set explicitly, as some versions fail otherwise
p.myplot + geom_line(stat="smooth", method="loess", formula=y ~ x)
# using the smooth geom with a confidence band (based on the standard error)
p.myplot + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# span=«0-1» controls the smoothing: smaller values follow the data more closely
p.myplot + geom_smooth(span=0.4)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
p.myplot + geom_smooth(span=1)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# turning off the confidence interval
# note: the argument level sets the confidence level (default 0.95)
p.myplot + geom_smooth(se=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# using a different method for smoothing (e.g. linear modelling)
p.myplot + geom_smooth(method="lm")
# using a custom formula for fitting
library(splines)
p.myplot + geom_smooth(method="lm", formula = y ~ ns(x,5))
# be careful when flipping a plot
# note: details on transformations follow in the next section
p.myplot + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
p.myplot + geom_smooth() + coord_flip()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
p.myplot + aes(x=mpg, y=disp) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Values can be transformed either before or after any stats-functions are applied.
Use scale transformations to apply a transformation before any stat functions are applied.
Use coordinate transformations to apply a transformation after all stat functions are applied (note: coord_flip() is a coordinate transformation).
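You can check the difference with layer_data(), which returns a layer's data after the stats but before drawing. The following sketch assumes a log10 transformation of disp:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point()

# scale transformation: values are log-transformed before any stat runs,
# so the built layer data already contains log10(disp)
d_scale <- layer_data(p + scale_x_log10())

# coordinate transformation: the layer data keeps the original values,
# the log is only applied when the panel is drawn
d_coord <- layer_data(p + coord_trans(x = "log10"))
```

Any stat (such as a regression) therefore sees log10(disp) in the first case and the untransformed disp in the second.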
# define a base plot to illustrate transformation
p.myplot <- ggplot(mtcars) + aes(x=disp, y=mpg) + geom_point(size=4) +
geom_smooth(method="lm", se=FALSE)
# take a look at linear regression plot
p.myplot
# apply a logarithmic transformation
p.myplot + scale_x_continuous(trans="log", name="log(disp)")
# apply a log transformation to the x-axis, add a linear regression and
# transform the display of the scale back with exponentiation
p.myplot + scale_x_continuous(trans="log") +
coord_trans(x="exp") +
xlab("exp(log(disp)) = disp")
# adjust the x-scale breaks to match our original non-transformed plot
p.myplot + scale_x_continuous(trans="log", breaks=seq(100,400,100)) +
coord_trans(x="exp") +
xlab("exp(log(disp)) = disp")
The ggplot2 package provides two interesting ways to look at subgroups in your data.
The group aesthetic is useful if there are only two or three groups and there is a summary statistic that should be calculated per group and displayed in one chart.
Faceting, on the other hand, is useful to split the data into different groups that are displayed next to each other in separate panels.
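The group aesthetic on its own is enough to split a stat, without adding colors or a legend. The following minimal sketch fits one regression line per transmission type (am) in mtcars:

```r
library(ggplot2)

# group = am splits the data for the stat without adding a color or legend:
# stat_smooth() fits one linear model per transmission type
p <- ggplot(mtcars, aes(x = wt, y = mpg, group = am)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# the computed smooth lines, one set of points per group
fits <- layer_data(p, 2)
print(p)
```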
# split the data into subgroups (one stacked bar segment per cut)
qplot(clarity, data=diamonds, geom="bar", fill=cut)
# the same thing in ggplot:
ggplot(data=diamonds,aes(x=clarity, fill=cut))+geom_bar()
# split the data by a variable and calculate a regression for each group
ggplot(mtcars, aes(x=disp, y=mpg, color=factor(am))) + geom_point(size=4) +
geom_smooth(aes(group=factor(am)), method="lm", se=FALSE, lty="dashed")
# use facets to split the data
p.myplot <- ggplot(mtcars) +
aes(x=disp, y=mpg, color=factor(am)) +
geom_point(size=4) +
geom_smooth(method="lm", se=FALSE, lty="dashed")
# facet_wrap will wrap the specified panels
p.myplot + facet_wrap(~ am, nrow=1)
p.myplot + facet_wrap(~ am, ncol=1)
# by default the scales of the different panels will match
# it is however possible to use adaptive (free) scales per panel
# note: more options can be found in the documentation
p.myplot + facet_wrap(~ am, nrow=1, scales="free")
# facet_grid can be used to split by two variables
p.myplot + facet_grid(cyl ~ am)
# it is even possible to add margin calculations
p.myplot + facet_grid(cyl ~ am, margins=TRUE)
Annotations can be added as separate layers that do not add entries to the legends. Their positions do, however, extend the axis ranges if they lie outside of the data.
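Annotations are not entirely independent of the scales. annotate() layers get no legend entry, but their x and y positions still train the position scales. The following small sketch places a text label beyond the data range; x = 7 is an arbitrary value:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # annotate() adds a layer without a legend entry,
  # but its x/y positions still train (extend) the position scales
  annotate("text", x = 7, y = 30, label = "note")

# the x range now reaches the annotation at x = 7 (the largest wt is about 5.4)
x_limits <- layer_scales(p)$x$get_limits()
print(p)
```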
# setup plot to illustrate annotations
p.myplot <- ggplot(mtcars, aes(x = wt, y = mpg))
# plot without annotations
p.myplot + geom_point(size=4, color="purple3")
# a plot with some simple annotations
p.myplot +
annotate("rect",
fill="lightsteelblue", alpha=0.4,
xmin=3, xmax=4, ymin=12, ymax=20.5) +
annotate("segment",
size = 1, color="steelblue",
arrow = grid::arrow(length=grid::unit(1, "char")),
x=4.73, y=30.5, xend=3.8, yend=21) +
annotate("text", label="A custom region",
x=4.32, y=31.2, hjust=0, vjust=0, color="steelblue", size=6) +
geom_point(size=4, color="purple3")
Unfortunately, ggplot2 (at least in the version this tutorial was written for) cannot paint textures or patterns as backgrounds.
However, we can easily define our own pattern-creating function.
# we create a function that calculates the coordinates for stripes that
# are contained in the given rect coordinates
# note: this involves some trigonometry and is outside
# the scope of this tutorial
stripesInRect <- function(angle=45, distance=0.5, xmin=0, xmax=10, ymin=0, ymax=10) {
# this function will calculate a data.frame of vectors for a
# striped background in a rectangular area
# convert angle from degrees to radians
radians <- (pi / 180) * angle
# calculate the tangent
tangens <- tan(radians)
# calculate height and width of the clipping box
height <- ymax - ymin
width <- xmax - xmin
# calculate the horizontal distance of the lines
horizontalDistance <- distance / tangens
# calculate the difference of start-y to end-y for full width
verticalDifference <- tangens * width
# steps for the height and width
stepsHeight <- seq(from = ymin, to = ymax, by = distance)
stepsWidth <- seq(from = xmin, to = xmax, by = horizontalDistance)
# initialize a data frame of coordinates
# note: distance is used for distance of lines when cutting
# through the side of the box
# note: we have to remove the first step from the widthsteps
# to avoid a duplicated start line
data <- data.frame(
"x1" = c(rep(xmin, times = length(stepsHeight)), stepsWidth[-1]),
"y1" = c(stepsHeight, rep(ymin, length(stepsWidth))[-1] ))
# define a function to calculate the endpoints
calculateEndpoint <- function(x1, y1) {
# calculate the maximal available width for the x range
availableWidthRange <- xmax - x1
if (availableWidthRange >= width) {
# calculation of lines that start from the left side
# calculate the maximal available height for the y range
availableHeightRange <- ymax - y1
# we are done if the vertical-side fits into the rect
if (availableHeightRange >= verticalDifference) {
return(c(
"x2" = xmax,
"y2" = y1 + verticalDifference))
}
# otherwise we have to adapt to the available height
horizontalDifference <- availableHeightRange / tangens
return(c(
"x2" = x1 + horizontalDifference,
"y2" = ymax))
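} else {
# calculation of lines that start from the bottom side
# calculate the vertical difference up to the right side of the rect
verticalDifference <- availableWidthRange * tangens
# lines that would leave the rect through the top side are cut at ymax
if (y1 + verticalDifference > ymax) {
return(c(
"x2" = x1 + (ymax - y1) / tangens,
"y2" = ymax))
}
return(c(
"x2" = xmax,
"y2" = y1 + verticalDifference
))
}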
}
# calculate the endpoints
endpoints <- mapply(calculateEndpoint, data$x1, data$y1)
# extract the endpoint coordinates
data$x2 <- endpoints[1,]
data$y2 <- endpoints[2,]
return(data)
}
# calculate the pattern coordinates for our plot
pattern <- stripesInRect(angle=80, distance=0.25,
xmin=3, xmax=4, ymin=12, ymax=20.5)
# create the plot with a striped background for the annotation
# note: annotation aesthetics are not mapped but will be processed as vectors
p.myplot +
annotate("segment",
size = 0.5, color="deeppink", alpha=0.25,
x=pattern$x1, y = pattern$y1,
xend = pattern$x2, yend = pattern$y2) +
annotate("segment",
size = 1, color="deeppink",
arrow = grid::arrow(length=grid::unit(1, "char")),
x=4.73, y=30.5, xend=3.8, yend=21) +
annotate("text", label="A custom region",
x=4.32, y=31.2, hjust=0, vjust=0, color="deeppink", size=6) +
geom_point(size=4, color="purple3")
The ggplot2 package separates the data-part of the plot from the non-data part.
The graphical parameters that control non-data elements, such as the background color or the font sizes, are collected in so-called themes.
It is possible to create custom themes that can be applied to multiple plots easily.
Use the function theme_set(«theme») to set a global theme for all plots.
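Here is a minimal sketch of a reusable theme. The name my_theme and the chosen tweaks are only examples. theme_set() returns the previously active theme, which makes it easy to restore afterwards:

```r
library(ggplot2)

# a custom theme (hypothetical name my_theme): theme_bw() with a few tweaks
my_theme <- theme_bw(base_size = 14) +
  theme(
    legend.position = "bottom",
    panel.grid.minor = element_blank()
  )

# theme_set() makes it the default for all following plots and
# returns the previously active theme so it can be restored later
old_theme <- theme_set(my_theme)
p <- ggplot(pressure, aes(temperature, pressure)) + geom_point()
print(p)  # drawn with my_theme

theme_set(old_theme)  # restore the previous default
```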
# define our plot to illustrate theming
p.myplot <- ggplot(pressure) +
aes(temperature, pressure, color=factor(temperature)) +
geom_point(size=4)
# plotting using the default theme
p.myplot
# plotting using a black & white theme
# note: the theme does not change the aesthetics controlled by data
p.myplot + theme_bw()
# modifying specific elements of a theme
# note: more options can be found in the documentation
# theme modifications may require some understanding of the grid-package
p.myplot + theme(
legend.position="top",
legend.margin=margin(1, 1, 1, 1, "cm"))
# legends usually need some further specific adjustments
p.myplot + theme(
legend.position="bottom",
legend.margin=margin(1, 1, 1, 1, "cm")) +
guides(
color=guide_legend("Temperature", nrow=2,
title.position="top", byrow=TRUE))
# use a combination of geoms and a specific stat for the bin calculation
# note: values computed by a stat can be accessed with after_stat(«variable»)
# (older code uses the ..«variable».. notation, deprecated since ggplot2 3.4.0)
ggplot(mtcars) + aes(x=factor(gear)) +
geom_linerange(stat="count", ymin=0, size=0.5, color="blue", aes(ymax=after_stat(count))) +
geom_point(stat="count", size=3, color="blue") +
geom_text(stat="count", vjust=-0.8, color="blue", aes(label=after_stat(count))) +
coord_flip() + theme_bw()
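# bundle the three layers plus coordinate system and theme into a reusable
# list; adding such a list to a ggplot adds all of its components
latticebars <- function(color="blue") {
p1 <- geom_linerange(stat="count", ymin=0, size=0.5, color=color, aes(ymax=after_stat(count)))
p2 <- geom_point(stat="count", size=3, color=color)
p3 <- geom_text(stat="count", vjust=-0.8, color=color, aes(label=after_stat(count)))
return(list(p1, p2, p3, coord_flip(), theme_bw()))
}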
# create a lattice like barplot with default color
ggplot(mtcars) + aes(x=factor(gear)) +
latticebars()
# easily change the color of the plot
ggplot(mtcars) + aes(x=factor(gear)) +
latticebars("red") +
xlab("Type of Gear\n") + ylab("\nNumber of Items")
ggplot2 is based on the grid package. Therefore, it is possible to fit multiple plots on the same page using the viewport functionality.
Further information about using viewports is available in the grid-package documentation.
# we are going to need the grid package
library(grid)
# convenience function to create multi-plot setup (nrow, ncol)
vp.setup <- function(x,y){
# create a new layout with grid
grid.newpage()
# define viewports and assign it to grid layout
pushViewport(viewport(layout = grid.layout(x,y)))
}
# convenience function to easily access layout (row, col)
vp.layout <- function(x,y){
viewport(layout.pos.row=x, layout.pos.col=y)
}
# define three plots to be displayed together
# p.a <- qplot(mpg, wt, data=mtcars, geom="point") + theme_bw()
pa <- ggplot(data=mtcars,aes(x=mpg,y=wt))+geom_point()+theme_bw() # full ggplot for above
# p.b <- qplot(mpg, wt, data=mtcars, geom="bar", stat="identity")
pb <- ggplot(data=mtcars, aes(x=mpg, y=wt))+geom_bar(stat="identity")
# p.c <- qplot(mpg, wt, data=mtcars, geom="step")
pc <- ggplot(data=mtcars, aes(x=mpg, y=wt))+geom_step()
# set up a multi-plot layout with grid (2x2 fields)
vp.setup(2,2)
# plot all graphics into our layout
print(pa, vp=vp.layout(1, 1:2))
print(pb, vp=vp.layout(2, 1))
print(pc, vp=vp.layout(2, 2))